Skip to content

Conversation

@szehon-ho
Copy link
Member

@szehon-ho szehon-ho commented Oct 29, 2021

Files metadata table was missing some columns, and also missing an example for partitioned tables in which the new partition column is added (users may want to know how to check what data files they have for a given partition)

@github-actions github-actions bot added the docs label Oct 29, 2021
@kbendick
Copy link
Contributor

cc @samredai as the docs refactor is underway 👍

| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null | [4] |
| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null | [4] |
+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+------------------+-----------------+-----------------+--------------+---------------+
+-------+-------------------------------------------------------------------------+-----------+---------------+------------+------------------+---------------------------+------------------------+------------------------+----------------+---------------------------------------+---------------------------------------+------------+-------------+------------+-------------+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to not revert the formatting changes? I think it is less readable with the initial space removed.

Maybe we should replace this with a real HTML table?

Here's a snippet of code that we use in our notebooks to format PySpark dataframes as nicer tables:

from prettytable import PrettyTable
from IPython.core.magic import register_line_cell_magic

class DFTable(PrettyTable):
    def __repr__(self):
        return self.get_string()

    def _repr_html_(self):
        return self.get_html_string()

def to_table(df, num_rows=100):
    cols = df.columns

    t = DFTable()
    t.field_names = cols
    t.align = "r"
    for row in df.limit(num_rows).collect():
        d = row.asDict()
        t.add_row([ d[col] for col in cols ])

    return t

That will produce both HTML and text tables that have reasonable formatting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, let me look at this.

Copy link
Contributor

@kbendick kbendick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat unrelated, but I noticed that lower_bounds has no value for some keys. Is this possibly confusing for readers of the docs?

[1 -> , 2 -> c]

Notice that 1 doesn't point to anything. Is this intentional and do we think this might confuse readers? I can open a separate issue if so.

@rdblue
Copy link
Contributor

rdblue commented Nov 1, 2021

@kbendick, I think that we should convert lower and upper bounds into human-readable strings. Right now, I think we pass them to Spark as a map of id to binary.

@szehon-ho
Copy link
Member Author

szehon-ho commented Nov 5, 2021

@rdblue @kbendick @samredai i made a markdown table for files, if you guys click 'View File' in github it should show what it looks like. If you want i can extend this to the other ones too, or do it separately.

By the way i also found a possible problem in the spec_id column added in #3015 , it should probably get hidden just like partition column if table is unpartitioned, I could fix that in another PR.

@samredai
Copy link
Contributor

samredai commented Nov 6, 2021

GitHub injects some overflow css to add the scroll bar, if we switch to markdown tables we'd have to add that as well. Here's how the table currently would look:
Screen Shot 2021-11-05 at 9 22 44 PM

But adding the following:

extra.css

.markdown-table-container {
  width: 780px;
  overflow-x: scroll;
}

and then surrounding your markdown table with:

<div class="markdown-table-container" markdown="block">
...markdown table here...
</div>

will make it look like this with the scrollbar:
Screen Shot 2021-11-05 at 9 44 05 PM

@rdblue
Copy link
Contributor

rdblue commented Nov 7, 2021

+1 to adding the scroll bar. Thanks @samredai!

@rdblue
Copy link
Contributor

rdblue commented Nov 7, 2021

The markdown table looks good to me. The only problem is that now just this table is markdown. Anyone want to follow up with an update for the other tables?

@KnightChess
Copy link
Contributor

The markdown table looks good to me. The only problem is that now just this table is markdown. Anyone want to follow up with an update for the other tables?

@rdblue I can update other tables in this pr #3482

szehon-ho and others added 3 commits November 8, 2021 21:19
Files metadata table was missing some columns, and also missing an example for partitioned tables (users may want to know how to check what data files they have for a given partition)
@szehon-ho
Copy link
Member Author

Screenshot 2021-11-08 at 21 22 30

Thanks @samredai and @rdblue for review , added the scroll bar using div and css as suggested. Attached screenshot

@KnightChess
Copy link
Contributor

image
use scroll bar, there will be a empty line under the form when there is not overflow. did I not use it or optimize the scroll bar @szehon-ho

@szehon-ho
Copy link
Member Author

@KnightChess I'm not sure to be honest, can you put up your change on your pr? We can continue discussion on that pr instead of this one?

@samredai
Copy link
Contributor

@szehon-ho can you update the CSS to the below? If we're using this for all markdown tables, auto will hide the scrollbar space that @KnightChess is seeing when the table width is below the container width.

.markdown-table-container {
  width: 780px;
  overflow-x: auto;
}

@szehon-ho
Copy link
Member Author

Done, thanks @KnightChess for finding the issue that would occur for the other table, and @samredai for figuring it out

@szehon-ho
Copy link
Member Author

Hey @samredai @rdblue , would we able to continue moving this forward? Was out on vacation, but my understanding is that we still keep the markdown files in this repo?

@samredai
Copy link
Contributor

@szehon-ho yes the plan is to keep the markdown files in this repo. There are some changes coming soon but any merged docs PR will be included. This looks good to me, I'll let @rdblue comment on if it's good to merge.

@rdblue rdblue merged commit 40ae303 into apache:master Dec 14, 2021
@rdblue
Copy link
Contributor

rdblue commented Dec 14, 2021

Merged. Thanks, @szehon-ho!

@szehon-ho
Copy link
Member Author

Thanks for fast response !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants